Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not parse URIs during LSP serialization/deserialization #76691

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dibarbet
Copy link
Member

@dibarbet dibarbet commented Jan 9, 2025

Resolves dotnet/vscode-csharp#7638

Related to dotnet/runtime#64707

Problem

System.Uri will throw attempting to parse URIs that have sub-delims in the host name. This is because it does additional host name validation beyond what is defined in the URI RFC spec. See above linked issues for specific examples.

When we get passed these URIs from VSCode, the server crashes because we cannot successfully deserialize the LSP URI string into a System.Uri. We are unable to easily recover from these as it happens before we get to the queue processing.

Solution

The approach I took in this PR is to stop parsing URIs during LSP message deserialization and remove usages of System.Uri from LSP directly. Instead, I created a wrapper type DocumentUri which initially stores just the string representation of the URI (exactly how LSP defines URIs). The System.Uri can be optionally retrieved from this (which will parse it). Only places which need to extract information from the URI should retrieve the System.Uri and must handle failures.
 
This allows URI parsing to be delayed until much later (for example looking at the scheme or file path to find a matching document in the workspace). While we still cannot parse the URI, we can handle the error and provide a misc document instead of crashing the server. These aren't normal files anyway (they don't conform to the file:///<path> format) and would go to misc regardless of if we succeed in parsing it or not.

If at some point the runtime adds a new parsing mode, or we switch to our own URI parser (similar to O#), this still provides value. Deserialization of the URI string and parsing the URI string are two separate logical operations and should not be tied together. Additionally, any runtime improvements here would only be available in .NET 10.

Key areas to look at

  1. DocumentUri
  2. DocumentUriConverter
  3. LspWorkspaceManager
  4. ProtocolConversions

Razor PR - https://github.com/dotnet/razor/pull/11390/files

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Jan 9, 2025
@dibarbet dibarbet force-pushed the uri_crash branch 9 times, most recently from a6bf829 to e0e73bb Compare January 14, 2025 00:28
/// TODO: document.
/// Converts the LSP spec URI string into our custom wrapper for URI strings.
/// We do not convert directly to <see cref="System.Uri"/> as it is unable to handle
/// certain valid RFC spec URIs. We do not want serialization / deserialization to fail if we cannot parse the URI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider linking to the runtime issue.

// Valid URI, but System.Uri cannot parse it.
[InlineData(true, "perforce://@=1454483/some/file/here/source.cs")]
[InlineData(false, "perforce://@=1454483/some/file/here/source.cs")]
public async Task TestOpenDocumentWithInvalidUri(bool mutatingLspWorkspace, string uriString)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new tests for these URIs

// If either of the URIs cannot be parsed, we'll compare the original URI strings.
if (otherUri.ParsedUri is null || this.ParsedUri is null)
{
return this.UriString == otherUri.UriString;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be false, right? because we did this exact check above, and would have returned 'true' if it succeeded. so why not return false here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that should definitely return false. good catch

/// In order to gracefully handle these issues, we defer the parsing of the URI until someone
/// actually asks for it (and can handle the failure).
/// </remarks>
internal sealed class DocumentUri : IEquatable<DocumentUri>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main new wrapper type

/// </summary>
internal class DocumentUriConverter : JsonConverter<Uri>
internal class DocumentUriConverter : JsonConverter<DocumentUri>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new serializer for new wrapper type

private class LspUriComparer : IEqualityComparer<Uri>
{
public static readonly LspUriComparer Instance = new();
public bool Equals(Uri? x, Uri? y)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to the new type

@@ -112,7 +62,7 @@ public int GetHashCode(Uri obj)
/// the URI.
/// <para/> Access to this is guaranteed to be serial by the <see cref="RequestExecutionQueue{RequestContextType}"/>
/// </summary>
private ImmutableDictionary<Uri, (SourceText Text, string LanguageId)> _trackedDocuments = ImmutableDictionary<Uri, (SourceText, string)>.Empty.WithComparers(LspUriComparer.Instance);
private ImmutableDictionary<DocumentUri, (SourceText Text, string LanguageId)> _trackedDocuments = ImmutableDictionary<DocumentUri, (SourceText SourceText, string LanguageId)>.Empty;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest of this file is interesting - contains changes to attempt to parse and handle issues

@dibarbet dibarbet marked this pull request as ready for review January 14, 2025 23:53
@dibarbet dibarbet requested a review from a team as a code owner January 14, 2025 23:53

namespace Microsoft.CodeAnalysis.ExternalAccess.Razor;

internal static class RazorUri
{
[Obsolete("Use RazorUri.GetUriFromFilePath instead")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Shouldn't this tell people to use CreateAbsoluteDocumentUri?

Although, if changing to DocumentUri is a breaking change for Razor, is there a reason to keep this obsolete at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So changing to documenturi doesn't seem to break non-cohosted razor. I'm not as sure about this one

if (this.ParsedUri is null)
{
// We can't do anything better than the uri string hash code if we cannot parse the URI.
return this.UriString.GetHashCode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand why GetHashCode shouldn't only look at UriString, and let the Equals method do the rest.

It's perhaps not a real world problem so take this with a grain of salt, but I think GetHashCode looking at the parsed Uri means a dictionary/hashset operation can have different results for the same Uri, depending on whether someone happens to have called ParsedUri yet. If GetHashCode is simpler, then things would collide, and the Equals method would save the day.

Potentially broken example (untested, obv):

var uri = "http://goo";
var documentUri1 = new DocumentUri(uri);
var documentUri2 = new DocumentUri(uri);
documentUri2.GetRequiredParsedUri();

var dict = new Dictionary<DocumentUri, object>();
dict.Add(documentUri1, new GiantObject());
Assert.True(dict.ContainsKey(documentUri2));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing ParsedUri will cause the URI to attempt to be parsed (which gethashcode does). So 1 and 2 should have the same hash code (but I will verify as well).

If GetHashCode is simpler, then things would collide, and the Equals method would save the day.

In this case, the issue is two different URI strings (with different string hashcodes) should be considered equal. For example an encoded vs. unencoded URI that point to the same file path should be considered equal (as System.Uri does).

There'd be no collision here because the hashcodes would be different, hence the need to look at both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I think maybe I read the null check on ParsedUri and forgot that would mean it would cause a parse.

@jasonmalinowski jasonmalinowski self-requested a review January 15, 2025 02:09
@dibarbet
Copy link
Member Author

As for the pause here - I need to update XAML with these changes, but have been working on infra. Will get back to this shortly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead VSCode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] System.Uri cannot parse certain URIs
4 participants